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Abstract 

Distributions measured in high energy physics experiments are usually distorted and/or 
transformed by various detector effects. A regularization method for unfolding these 
distributions is re-formulated in terms of the Singular Value Decomposition (SVD) of the 
response matrix. A relatively simple, yet quite efficient unfolding procedure is explained 
in detail. The concise linear algorithm results in a straightforward implementation with 
full error propagation, including the complete covariance matrix and its inverse. Several 
improvements upon widely used procedures are proposed, and recommendations are given 
how to simplify the task by the proper choice of the matrix. Ways of determining the 
optimal value of the regularization parameter are suggested and discussed, and several 
examples illustrating the use of the method are presented. 
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1 Introduction 



In high energy physics, measurements of physical observables — spectra of invariant 
masses, angular distributions of particles, etc. — are usually distorted and transformed 
by various effects such as finite resolution and limited acceptance of the detector. For 
this reason, it is often impossible, or at least very difficult, to make direct comparisons of 
the data obtained using different detectors with each other and with various theoretical 
predictions. 

In order to overcome this problem, some sophisticated studies of the measurement 
process are usually carried out. Ideally, these studies should result in a two- variable 
function describing the response of the detector, so that the actual measured distribution 
can be considered as a convolution of this function with the true one. This in general 
leads to an integral equation for the true distribution. Solving this equation (i.e. unfolding 
the true distribution) usually requires some kind of discretization, leading to a system of 
linear equations. The problem, however, belongs to a class of ill-posed problems, which are 
unstable against small variations in the initial system. Because of inevitable statistical 
errors in the measured distribution, the exact solution (if it exists) is usually wildly 
oscillating and useless (see ref. [||] for a good introduction to the subject Q ). 

For a number of reasons, in present high energy physics applications, the above ap- 
proach is usually replaced by a discrete Monte Carlo simulation of the measurement 
process, resulting directly in a system of linear equations for the underlying true dis- 
crete distribution. In this case all the above difficulties are aggravated by statistical and 
possibly systematic errors in the response matrix itself. 

In order to avoid these difficulties, it is sometimes advisable to fold the theoretically 
predicted true distribution with the estimated response matrix, and compare the thus 
folded theoretical spectrum with the measured one. This method is stable and may be 
useful in certain cases, but is clearly useless if a comparison between various experiments 
is required, or if the functional form of the distribution is unknown. 

The problem of unfolding has been studied in various forms, giving rise to a number of 
independent methods described in the literature. For instance, a method based on Bayes' 
theorem was proposed in 0, 0. The authors manage to avoid partly the inversion diffi- 
culties by using a non-linear iterative procedure, leading asymptotically to the unfolded 
distribution. 

Another way of overcoming the instability of unfolding is to use some kind of regu- 
larization condition, based on some a priori information about the solution. One can 
demand, for example, that the true solution has minimum curvature (i.e. is quite smooth 
[|l|]), or that it is strictly positive 0]. These methods usually allow the suppression of 
spurious oscillating components of the unfolded solution and often lead to satisfactory 
results, though concrete implementations happen to be quite lengthy and complicated. 

In this paper, we propose a new way of analyzing unfolding problems, based on the 
Singular Value Decomposition (SVD) of the response matrix. A simple and transparent 
2-dimensional example is used to explain and illustrate the reasons for the apparent 
instability of the problem, and a discrete analog of the minimum curvature condition is 
used to stabilize the unfolded solution. An unfolding procedure based on considerations 

^Many of the relevant topics are compiled also in the recent publication 0. 
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which are similar to those presented here has been described more than a decade ago in , 
and is still widely used in experimental analyses. While there are no significant differences 
between the foundations of the two procedures, we believe that our formulation allows 
one to obtain more reliable and precise results, being at the same time much simpler and 
easier to implement. 

Our approach is based on the extensive use of SVD, and results in a linear unfolding 
algorithm which is applicable to a wide range of problems. We derive also a number of 
important recommendations about the proper normalization of the response matrix and 
choice of the variables. Complete error propagation is implemented, and a vivid and 
reliable procedure for determining the optimal value of the regularization parameter is 
suggested. 

The paper is organized as follows. Sect. § contains our conventions for the notation 
used throughout the paper. The problem is formulated in Sect. |^. Sect. | is devoted 
to the Singular Value Decomposition, illustrated by a simple example of a 2 x 2 matrix. 
The benefits and pitfalls of rescaling equations and unknowns are analysed in Sect. |^. 
Regularization of the system and actual unfolding is performed in Sect. H, and the choice of 
the regularization parameter is discussed in Sect. |^. The step- by-step unfolding algorithm 
is presented in Sect. |] and is illustrated by two distinctly different examples in Sect. ||. 



Conclusions are drawn out in Sect. 10 



2 Notation 

We will adopt the following notation conventions, which will be used throughout the 
paper: 

• All one-dimensional histograms/vectors are denoted by small letters (e.g. b,z etc.). 

• All two-dimensional histograms/matrices are denoted by capital letters (e.g. A,X 
etc.). 

• The covariance matrix associated with a one-dimensional variable is denoted by the 
same letter in capital (i.e. the matrix Wij denotes the covariance matrix of the 
vector Wj). 

• No implicit summation is assumed over repeated indices; i.e. no repeated index is 
summed unless the summation is explicitly shown. 

• Upper index T stands for the transposed matrix, = A^i, so that the euclidean 
norm of a vector z equals V z'^z. 

• Upper index —1 denotes the inverse matrix, A^'^A = AA^^ = I, where I stands for 
the unit matrix, lik = 5ik- 

• All scalar variables are denoted by small greek letters (e.g. e, r etc.). 
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3 The Problem 



Let the distribution of a measwrerf observable be stored in a vector b of dimension n?,, where 
the ith coordinate of the vector contains the number of entries in the corresponding bin of 
the histogram. The measurement is affected by the finite experimental resolution and/or 
the limited acceptance of the detector, so that each event from the true distribution may 
find itself in a range of (not necessarily) adjacent bins, or nowhere at all. Let us assume, 
that we are able to simulate the measurement procedure of this observable (e.g. with 
Monte Carlo techniques). We generate the distribution a;'"' of dimension Ux, according to 
some idea of the underlying physical process, and perform our detector simulation. At 
this stage, every entry in a measured bin (i.e. every event) can be directly traced to its 
origin. This gives us a well defined system of linear relations between the simulated true 
and measured distributions: 

ix™ = 6'", (1) 

The Ub X Tlx matrix A is a probability matrix, which actually performs the simulated 
folding procedure. Now, with A and given, for any vector b obtained by a real 
measurement using the detector described by its response matrix A, one can attempt to 
find a corresponding unfolded true distribution x. It is well known that trying to solve 
the linear system of equations 

Ax^b (2) 

against x directly, using the exact inversion of the matrix, usually leads to completely 
unacceptable rapidly oscillating solutions. In the following, based on the Singular Value 
Decomposition (SVD) of the response matrix A, we analyse reasons for this behavior, 
locate the difficulty and propose a relatively simple and straightforward regularization 
method, which allows one to suppress spurious, quickly oscillating components of the 
solution, leaving only statistically significant terms. 

The above formulation of the problem may not seem general enough, but it clearly 
incorporates most cases of practical interest, is easily understood and interpreted, and is 
well-suited for the unfolding method described in this paper. Let us, however, express the 
discrete distributions x, b and the response matrix A in terms of the underlying continuous 
probability density functions. 

Let y*"^"*^ be the continuous true variable under consideration, whose variation range 
l^truc yl^^^} is divided into Ux bins with boundaries yf^^^j = 1, . . . , n^; — 1. Each com- 
ponent of the vector x is then calculated as an integral over the true distribution function 
A'(|/*™'^) in the appropriate range: 



Xj 



I'l^ dy'^^^Xiy^'^n, j^l,...,nx. (3) 

Analogously, let {yo -r- ym} be the variation range of the measured variable y, with bin 
boundaries yi,i = 1, . . . ,nb — I- Then the components of the vector b are appropriate 
integrals over the continuous distribution function B{y) : 

bi= dyB{y) , i = 1,...,^^. (4) 
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Let A{y, y*^"*^) be the detector response function, which maps the true distribution to the 
observed one, according to the convolution integrah 

B(y) = . (5) 

JyUue 

After this, the response matrix A can be defined as the ratio of two integrals: 

^ ly: 1 dy&-'^ dy'^'^'' y'''^") -^(2/*"') 

^l] — true ■ 



yj-l 



Each element A^j equals to the probability for an event generated in the true bin j to be 
found in measured bin i. 

In high energy physics applications the response function A{y, y*"^"^) is usually not 
known analytically. Instead, some sophisticated detector simulation techniques are used 
to determine the matrix directly as explained above. 

4 Singular Value Decomposition 

4.1 Definitions 

A singular value decomposition (SVD) of a real mxn matrix A is its factorization of the 
form 

A^USV^ (7) 

where U is an m x m orthogonal matrix, y is an n x n orthogonal matrix, while S is an 
mxn diagonal matrix with non-negative diagonal elements: 

UU^ = U^U = I, VV^ = V^V = I, (8) 

Sij^O for i^j, Su = Si>0. (9) 

The quantities Sj are called singular values of the matrix A, and columns of U and V are 
called the left and right singular vectors. 

The singular values contain very valuable information about the properties of the 
matrix. If, for example, A is itself orthogonal, all its singular values are equal to 1. On 
the contrary, a degenerate matrix will have at least one zero among its singular values. 
In fact, the rank of a matrix is the number of its non-zero singular values. If the matrix 
and/or the r.h.s. of a linear system is known with some level of uncertainty, and some 
singular values of the matrix are significantly smaller than others, the system may be 
difficult to solve even if formally the matrix has full rank. In many aspects such matrices 
behave like degenerate ones, and SVD suggests a method of treating such problems, which 
is common for small and exactly zero singular values. 

We will assume that the singular values Sj form a non-increasing sequence. This is eas- 
ily achieved by swapping pairs of singular values, swapping simultaneously corresponding 
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columns of U and V. We will assume also that m > n, which means that the number of 
bins in the measured histogram b should not be smaller than the number of bins in the 
unfolded histogram x. If necessary, one can just add rows of zeroes to the initial matrix. 

Comprehensive descriptions of SVD with many technical details and examples can be 
found in the literature (see, e.g., the excellent books H). One of the most attractive 
features of this procedure is that one does not really have to perform SVD by hand. A 
very efficient and transparent FORTRAN subroutine (called, not surprisingly, SVD) is 
present in the CERN program library. Some earlier implementations can be found in refs. 
i, § as well. 

Once the matrix is decomposed into the form (|^), its properties can be readily analyzed 
and it becomes very easy to manipulate, as illustrated in following subsections. This kind 
of analysis is extremely useful for ill-defined linear systems with almost (or even exactly) 
degenerate matrices, as it not only locates the difficulty, but can also suggest ways of 
overcoming it. 



4.2 A simple example 

Consider a very simple 2x2 example, which nevertheless incorporates most of the inter- 
esting aspects of the problem. 

Let the response matrix A have the form 



A=-j r: 7 , (10) 




with < e < 1 determining the "quality" of the detector: e = 1 means an ideal detector 
with the response matrix equal to unity, while small e <^ 1 corresponds to a poor detector, 
almost unable to distinguish the two bins. Note however, that the overall efficiency is 
100%, so that no event escapes detection (sum of elements in each column equals 1). 
The measurement process now is simulated by multiplying the matrix A over the true 
distribution x, resulting in the measured histogram b: 

Ax = b (11) 

With vector b measured and the response matrix ([10|) given, one can try to unfold the 
true distribution. 

Singular value decomposition of 2 x 2 matrices is very simple, involving just a single 
rotation from left and another from right. As the matrix (p!0| ) is symmetric, the orthogonal 
matrices U and V should coincide. SVD can be performed explicitly by hand, and one 
easily obtains: 

A = USV^ (12) 

with 



So, the two singular values of the matrix ([ToD are si = 1, S2 = e. 
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4.3 Solving a linear system using SVD 

Suppose now that the apparatus described by the matrix (|TUp has been used to measure 
numbers of events in a two-bin histogram 



b2 



(14) 



Let B be the corresponding covariance matrix, which is especially simple for purely sta- 
tistical errors in independent entries hi and 62 : 

In order to solve the system, let us use U, S and V from (|T3|) to rotate both the 
unknown vector x and the r.h.s. of the system b, 

^/2\xi-X2 J' V2 V fei - &2 y ^ ^ 

in order to form a diagonal system of equations 

Sz = d, z = S-^d, (17) 



where 



1 



1 



\ 

The unknown vector x can now be easily obtained by rotating z back: 



x = Vz = VS-'d = VS-'U^b = A-'b = i f 1 ) • 



Expression (|T9]) gives the exact solution of the system (p^ ) for whatever small but finite 
e. 

Formally, SVD of the matrix A means a decomposition of the r.h.s. b into a series 
of orthogonal and normalized functions of the discrete variable i = 1, . . . ,?T,f,. The basis 
is given by the columns of the matrix U, and the components of the vector d form the 
coefficients of this decomposition. Similarly, the vector of unknowns x is also decomposed 
into a series of ortho- normalized functions of the discrete variable j = 1, . . . ,nx, given 
by the columns of the matrix V, while the coefficients stored in the vector z are new 
unknowns. After performing these transformations, the initial system of equations ([Tl|) 
is reduced to the diagonal system ([T7|) which can be easily solved: the matrix S in (|T3|) 
is diagonal and can be inverted by just inverting the singular values. 

The inverse matrix A~^ exists for any e 7^ 0: 

A-^^vs-^u- = ^( ; + ' -\^')^-(] ])+-( \ -')■ (20) 

2e\-l + e 1 + e / 2\1 1/ 2e \ -1 1 ^ ^ 
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Note that the expressions (|I^) and ( PO] ) are exact, so that SVD and the subsequent 
analysis can be considered as just another method of solving well determined full rank 
linear systems, maybe a bit too complicated but quite capable. If all components of 
the rotated r.h.s. d are statistically significant and if neither of the singular values Sj 
of the matrix A is too small, the system (|TTD can be solved without any problem using 
any other method like, say, Gaussian elimination. But if e is small the problem becomes 
ill-determined, and when in addition the r.h.s. is affected by measurement errors, the 
exact solution usually does not make any sense. In this case clearly it is not the exact 
solution we are looking for, and conventional methods of solving linear systems do not 
work. Usually they cannot even detect the problem. 

To illustrate this, let us assume that the measured event numbers bi and 62 satisfy the 
following relation: 

ih-b2f<h+b2. (21) 

This means that the difference bi — 62 is not statistically significant, so that the first 
term in the exact solution ([T9|) is in fact a random number. But if e is small enough (in 



this case - smaller than 1/a/^i + ^2), this first term in both xi and X2 dominates over 
the well-behaved and statistically significant second term, leading to almost arbitrary 
and senseless results. This phenomenon can be easily understood: for very small e the 
apparatus is almost "blind" , and one can hardly expect to determine Xi and X2 separately, 
unless the errors in b are sufficiently small. 

4.4 Locating the difficulty 

In order to trace the problem back to its origin let us reconsider the rotated system (jn\j. 
Under the assumption ( pTD about the statistical accuracy of the data, the covariance 
matrix D of the rotated r.h.s. d is approximately diagonal, 

rTr>TT 1 / &1+62 bi-b2 \ ^1 f bi+b2 



2\bi-b2 bi + b2 J 2\ &i + &2 / ^ ^ 

so one has the following (almost) independent set of equations: 



l-zi = (61 + b2)/V2 ± v/(&i + &2)/2, (23) 

e.z2 = {bi-b2)/V2±^{br + b2)/2. (24) 

The first equation is good and gives a sensible result for zi, but the second one is not too 
useful even if e is large, as the r.h.s. is in fact a random number. However, for e close to 
unity it is at least harmless; it becomes dangerous for small e, when the random number 
in the r.h.s. gets strongly amplified after being divided by e, and after the rotation back 
to X according to (pJ|), gives a huge and senseless contribution to both components of 
X in the exact solution ([T9|) . This means that Z2 cannot and should not be determined 



from equation (|2^), because effectively the matrix has insufficient rank and the system 



is over-determined. However sensible equation (|2^) may look, it should be replaced by 
another equation 

■ ^2 = ± ^{bi + b2)/2, (25) 
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which contains as much information as (P^, but is much less harmful. The value of Z2 is 
completely arbitrary and can be determined only from some external condition. Note that 
each value of will lead to a new vector x. The easiest thing to do is to put Z2 = 0, which 
would lead to the "shortest" x: || x p= Xi + X2 = zf + z\^ as orthogonal transformations 
do not change the euclidean norm of a vector. Or, alternatively, one could choose the 
solution minimizing the variation (xi —x^)^. In our small example both these alternatives 
lead to the same "regularized" vector 

x- = ^fjV (26) 



Corresponding regularized covariance matrix has the form 

These two equations define the best reasonable estimate for an unfolded vector x, given 
the condition on the statistical accuracy of the data (|2l|) and a poor detector with e ^ 



5 Rescaling equations and normalizing unknowns 

Let us now get back to the full-scale problem defined in Sect. ^ and look at the initial 
linear system (Q) from another viewpoint. It represents the solution of the following least 
square problem: 

^($;]4x,-6,)' = min, (28) 
i=\ j=\ 

and is adequate if the equations are exact, or if the errors in the r.h.s. are identical. This 
is not generally the measurement errors in the vector h vary from bin to bin, 

and hence, different equations have different significance. In fact, one should consider a 
weighted least squares problem, where the following expression is being minimized: 

(^^^^ ) = nim, (29) 

where Afej is the error in hi. The general case of (p9[) reads 

{Ax-hfB~^{Ax-h)=mm, (30) 
where B is the covariance matrix of the measured vector h. 
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5.1 Normalization of the unknowns 

It is well known, that the exact solution of a well-determined linear system remains 
unchanged, if either the equations, or the unknowns, or both are rescaled. However, in 
the cases under consideration (where < ni, and some singular values are small) the 
minimization of (^) leads to an overdetermined system which should be solved in the 
least-squares sense. In this case any rescaling of equations and/or unknowns changes the 
singular values of the system and hence the solution as well. One can suggest various 
ways of rescaling, and some of them may lead to a serious improvement in the system 
behavior. One of our most important tasks is to optimize the system by rescaling it so 
that significant information is not suppressed while non-significant is not enhanced. 

One can, for instance, divide an unknown Xk everywhere in the system by a number 
A, multiplying simultaneously all corresponding coefficients An., i = 1, . . . , n^, by the same 
number A. Choosing various A's for different A;'s, one can obtain substantially different 
matrices. We argue that one particular choice of rescaling coefficients is the most suitable 
for our purposes, provided the probability matrix A is obtained using a Monte Carlo 
simulation procedure (see Sect. ^. 

Consider a new unknown vector wj = xj/x^^ which measures the deviation of x 
from the initial Monte Carlo input vector a;"\ If one now multiplies each column of the 
probability matrix Aij by the corresponding number of events generated in this bin x^^^, 
the system becomes 

rix 

J2 AjWj = bi, (31) 

where Aij is no longer the probability, but rather the actual number of events, which 
were generated in bin j and ended up in bin i 0. Obviously, x = a;™' corresponds to all 
components of the vector w being equal to 1, so that 6™' = J2]=iAij. At the end of the 
unfolding procedure, in order to obtain the correctly normalized unfolded solution xj, one 
has to multiply the unfolded vector w by x™': 

Xj = Wjxf\ j = l,...,n^. (32) 

Of course, if the number of generated events is the same for each bin, x™' = const, then 
the probability matrix A and the number-of-events matrix A coincide up to an overall 
constant factor which is completely irrelevant for our analysis. 

The systems (|]) and (|3T|) are completely equivalent for any shape of x™\ if the exact 
solution is required, but there are two serious reasons why, for the class of problems 
considered here, ([31| ) is much better suited. 

The first reason is fairly simple: if the initial Monte Carlo distribution x™' is physically 
motivated and is reasonably close to the one being unfolded, the unknown vector w 
should be smooth and should have small bin-to-bin variation, thus requiring less terms 
in the decomposition into orthogonal functions. This in turn means that more accurate 
unfolding should be possible, as fewer unknowns are required in order to obtain the 
unfolded solution. 

^ If defined through continuous probability distributions, this new matrix is equal to the numerator 
of eq. d). 
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The second reason is more technical and is connected to the singular value analysis. 
Whatever high statistics is generated in order to obtain the matrix A, some of its columns 
and/or rows may contain very few events, and some elements Aij may have just a single 
entry. In the probability matrix, these elements will contain the largest possible value 
of 1, unjustifiably giving a high weight to that particular equation and unknown, and 
the fact that this element has a 100 % error is completely ignored. At the same time, 
highly populated columns with statistically well-determined elements usually contain val- 
ues significantly smaller than 1, due to finite resolution and limited acceptance. This 
clearly makes the probability matrix A a bad choice. On the contrary, the elements of 
the number-of-event matrix A are large if the generated statistics is large, and vice versa, 
thus giving a larger weight to better determined equations and unknowns. 

The latter argument is in fact based on a quite formal consideration using perturbation 
theorems for the singular values of a matrix 0, and is an attempt to account for the 
intrinsic errors in the response matrix. It can be shown that for the cases of interest, 
initial Monte Carlo statistics should be at least one or two orders of magnitude higher 
than the statistics of the measured data. If so, the error in the unfolded solution based 
on ( [HT| ) is dominated by the measurement errors in the r.h.s. b. Note that this is not true 
for the system based on the probability matrix (0), where huge errors may be introduced 
because of the fact that scarcely populated areas of the response matrix have far larger 
weight than they deserve. 

5.2 Rescaling equations 

The very form of ([29|) clearly suggests the way of rescaling the equations: after dividing 
each equation by the corresponding error A6j one obtains a balanced system, where all 
the equations have equal weights. 

If B is not diagonal, equation rescaling becomes slightly more complicated but still 
straightforward. Being a covariance matrix, B should be symmetric and positive-definite, 
so its SVD yields: 

B = QRQ^, Rii = r\ ^ 0, Rij = for i ^ j, B'^ = QR-^Q^. (33) 

Substituting B~^ into (0) one sees that after the rotation and rescaling of both the r.h.s. 
b and the matrix A, 

Aij ^ ^ QimAfnj , 6j ^ ^ Qimbmi l"^^) 

m m 

the expression being minimized looks very simple again, 

{Aw - b)^{Aw -b) = min, (35) 

and the minimization leads to the following system: 

^AijWj = bi. (36) 
i 

The covariance matrix of the rescaled r.h.s., -B, is now explicitly made equal to the unit 
matrix /, and all the equations have equal importance. 
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6 Regularization and unfolding 



The transition from ( ]28|) to (^) changes the appearance of the system from (g) to (|36D. 
The singular values of the matrix are also changed, but the main problem with small 
singular values still remains. The exact solution of (^) will again most certainly lead to 
a rapidly oscillating distribution, which may have a smaller amplitude but is still useless. 
This spurious oscillatory component should be suppressed, using some a priori knowledge 
about the solution. Technically this can be achieved by adding the regularization or 
stabilization term to the expression to be minimized (see @, |l|, ^ and references therein): 



iAw-bf{Aw-b) + T- (CwYCw 



mm. 



(37) 



Here C is a matrix which defines the a priori condition on the solution, while the value 
of the regularization parameter r determines the relative weight of this condition. For 
example, the choice Cik = Sik would minimize the euclidean norm of the vector w, and if 
r is set to be infinitely large, this would result in a vector Wj = for any A and b. 

While the optimal value of r is very much problem-dependent and its determination 
is an important part of our procedure, the explicit form of the matrix C should be chosen 
from general considerations. The common belief is that the solution histogram w should 
be smooth, with small bin-to-bin variation. Let us define the "curvature" of the discrete 
distribution wj as the sum of the squares of its second derivatives: 



(38) 



Then the choice 
/ 

C = 

V 



-1 


1 








1 


-2 


1 








1 


-2 


1 



1 -2 
1 



(39) 



-1 / 



will suppress solutions w having large curvatures. Minimization of ( p7D leads to a new 
linear system, which has additional equations: 



A 



w 



b 




(40) 



This system is clearly over-determined, and one can apply SVD to the (n^, + rix) x 
matrix in the l.h.s. in order to solve it. This is possible, but would require calling SVD 
for each value of r. Fortunately, a more efficient method (called sometimes damped least 
squares ) can be suggested, which allows to express the solution of ( ^0|) for any r 
through the solution of the initial non-regularized problem corresponding to r = 0. The 
first step is to make the regularization term proportional to the unit matrix J: 



Cw 



b 




(41) 
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For r = the system (^) is equivalent to (PBD, if the inverse exists and can be safely 
calculated. The "second derivative" matrix (^), however, is apparently degenerate (every 
column and every row sums up to zero) so some measures should be taken to make the 
exact inversion possible. The easiest thing to do is to add a small diagonal component, 
Cik =^ Cik + ^ ■ Sik, with ^ large enough to make the inversion possible, but small enough 
not to change significantly the condition of minimum curvature. In most cases, ^ = 10~^ 
or 10~^ is a good choice. C now is a symmetric non-singular matrix. 



C 



( -1 + ^ 
1 





V 



1 

1 




1 

+ C 





1 

1 - 



2 + e 
1 



1 



(42) 



which can be inverted using standard techniques Q 

Let us now solve the system (|4lD with r = 0. First, one needs SVD to decompose the 
product of matrices AC~^: 



AC-^ = USV^. 



(43) 



Here, once again, U and V are orthogonal and S is diagonal, with non-increasing positive 
diagonal elements Sj. The solution now proceeds as in the 2x2 example of section 
Rotate both b and Cw to obtain a diagonal system: 



d = U^b, z = V^C w. 
The system now looks (and actually is) very simple: 



di 



1 , . . . , Hx ■ 



(44) 



(45) 



Note that because the covariance matrix of the r.h.s. b was made equal to the unit matrix, 
the orthogonality of U guarantees that the new rotated r.h.s. d also has a unit covariance 
matrix, i.e. the equations in ( ^5] ) are completely independent and have identical unit 
errors in their r.h.s. 

Obviously, solving (^5]) one obtains the exact solution of the non-regularized system: 



.(0) 



di 



— , w 



(0) = 



(46) 



and the true distribution x can be obtained by multiplying each Wi by the corresponding 
With r = there is no regularization, so this solution is as useless as it used to 



be. But the solution of the system (|40| ) with nonzero r can now be found very easily. 



•^One has, however, to be careful, as the matrix is too close to a singular one, and some of the standard 
routines may not work for small E.g., RFIN, RSINV and RSFINV have failed for ^ = 10"'*, while 
RINV was successful. So, it may be convenient to use SVD once again for this purpose: decompose 
C = UcScV^ and then calculate C'^ = VcS^^U^. 



12 



using the procedure explained in detail in (Chapter 25, Sect. 4). In short, introducing 
non-zero r is effectively equivalent to changing di by a regularized distribution: 

rff) = rf.^, (47) 
so that the solution of the rotated system becomes 

^M = C-VzW. (48) 

sf + r' 

One can now see how nonzero r regularizes the singularities due to small Sj's, effectively 
working as a cutoff for a low-pass filter, if Fourier-transform terminology is used. Indeed, 
Si is small when the index i is large, which in general corresponds to quickly oscillating 
singular vectors (i.e. columns of U and V) defining the new basis in the rotated space. 

Continuing the analogy with Fourier analysis, one can mention that the cutoff provided 
by the above regularization procedure happens to be quite smooth, thus avoiding specific 
quasi-periodic fluctuations of the solution known as the Gibbs phenomenon. 

The covariance matrices Z and W of the solutions ( |45| ) can now be easily calculated: 

= (49) 



l^W = C-Vz("V^C^-\ (50) 

Now in order to obtain the true unfolded distribution x and its covariance matrix X 
one has to multiply w and W by the initial Monte Carlo distribution x™': 

= xfn^, (51) 

X[l> = xTWl^^xT. (52) 

It is important to note that while (^l]) and (|5^) are regularized and as such depend 
on the value of r, the inverse of the covariance matrix X~^ (which should be used for any 
calculation involving the unfolded distribution (|5T|) ), is regular and readily calculable: 

^7.' = ™E4^^^- (53) 

In fact, X*-^-* defined by ([5^ ) is the effective pseudoinverse of the matrix (0). This means 
that while the equation 

XMX-^XW = (54) 

is valid as if X*^^-* were the true inverse of X~^, for a different ordering one has (see [0, H): 

\\X-^X^^^X-^ -X-^ \\<T. (55) 

It may be interesting to write out the exact inverse covariance matrix for the 2x2 
example of Sect. ^: 

The regularized covariance matrix (|26|) is actually the pseudoinverse of (|56|) . As expected, 
the latter is perfectly regular in the limit of zero e. 
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7 Error analysis and choice of r 



Very important and interesting information about the whole problem can be disclosed by 
plotting di, or, better, log|(ij| vs i. As mentioned in subsect. [4.3| , the i-th component 
of the vector d is the coefficient in the decomposition of the measured (and rescaled) 
histogram h in front of a basis function defined by the i-th column of the rotation matrix 
U . For reasonably smooth measured distributions, only the first few (say, k) terms of the 
decomposition are expected to be significant, while the contribution of quickly oscillating 
basis vectors corresponding to large values oi i > k should be compatible with zero, 
well within the statistical errors in di (which are equal to 1 for all i). So, on the plot 
one should see two separate patterns (see section ^ for a few illustrations): for small i, 
di should be statistically significant, \di\ ^ 1, falling gradually (usually exponentially) 
towards a gaussian-distributed random value for large i with the variance equal to 1 
and the mean close to zero (the absolute values of non-significant components \di\,i > k 
should have the average close to Xjl^ 0.28). The critical value i = k, after which 
(ij's are non-significant, determines the effective rank of the obtained system of equations. 
Usually it is clearly seen on the plot of log|(ii| vs i, as the value of i where the behavior 
of di changes from exponentially falling to a constant. 

The standard statistical tests can be used to check whether the last n^ — k components 
of di are compatible with the expected normal distribution with zero mean and unit 
variance. If this is not the case, then the errors in the measured data (or maybe the 
response matrix itself) are not estimated correctly, and we recommend finding out the 
reason of the discrepancy before proceeding with the unfolding. If, for example,the actual 
measurement errors in b are under(over)estimated, then the variance of di for i > k will be 
smaller (larger) than 1. Moreover, if some additional correlations exist in the measured 
data which are not accounted for in the covariance matrix B, then log|(ii| may steadily 
decrease for all i, though probably for i > A; the slope will be different. 

All this shows that the analysis of the plot of log|(ii| vs i is of great interest by itself, 
being able to reveal the actual level of understanding the measurement errors in the 
experiment described by the simulated matrix A. 

Anyway, if the number of statistically significant equations is determined to be equal 
to k, the regularization parameter r should be put equal to the square of the kth singular 
value Sk = Skk of the matrix AC"^, determined in (|43D : 



r = si (57) 

With r given by (|57D, the unfolded vector x, its covariance matrix X and the inverse of 
the latter are completely defined by corresponding equations (|5l|),(^) and (^) and 
can be easily calculated, forming the solution of the unfolding problem. 

Yet another (and maybe more convincing) way of determining r is to generate a test 
distribution which is close to the expected true one, but still significantly different from 
the initial Monte Carlo distribution a;'"\ Then one should simulate the measurement 
process by applying the response matrix to it, and add corresponding random statistical 
errors to the thus obtained "measured" distribution. The described unfolding procedure 
should be applied to the latter, and the best choice for r is the one giving the smallest 
between the test and the unfolded distributions (see our second example in Sect. ^. 
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8 The algorithm 



In this section we present the concise description of the complete unfolding algorithm. The 
algorithm is linear (i.e. contains no loops) and can be divided into three distinct parts: 
initialization, rescaling/rotation and actual unfolding. Each step includes references to 
relevant subsections and equations. 

• Initialization: 



1. Define the number of bins Ub and bin boundaries of the measured histogram b. 

2. Define the number of bins and bin boundaries, common for the initial Monte 
Carlo and the unfolded distribution x. 



3. Build the "second derivative" matrix C, according to eq. (^2]). 

4. Calculate the inverse (see Sect.|^). 

5. Generate the initial Monte Carlo histogram and simulate the detector 
response in terms of the two-dimensional nj, x Ux histogram A. Elements of A 
should contain actual numbers of events, rather than probabilities. 

6. Read and fill the measured distribution b and its covariance matrix B. 
Rescaling and rotation: 

1. Perform SVD of the covariance matrix B, according to eq. (|33|) . 

2. Rotate and rescale both the r.h.s. b and the matrix A, in order to make the 
covariance matrix of the r.h.s. equal to the unit matrix, according to eqs. (0). 

3. Calculate the inverse of the covariance matrix, X~^, of the unfolded vector x, 
according to eq. (|53|). 

4. Multiply matrices A and C^^ and perform SVD of the product, according to 
eq. dH). 

5. Calculate the rotated r.h.s. d, according to eq. (|4^) . 



Unfolding: 

1. Plot log|(ij| vs i and determine the effective rank k of the system (see Sect. |^. 

2. Put r = s|. 

3. Calculate z^^\ w^^\ Z^^\ W^^\ according to eqs. (^^. 



4. Calculate the unfolded distribution a;*^'^^ and its covariance matrix X'^'^\ accord- 
ing to eqs. ( ^Tj - p^ ). 

The vector x*-"^-* and matrices X^^^ and form the complete solution of the unfolding 
problem defined by the matrix A, simulated initial distribution x™, the measured vector 
b and its covariance matrix B. 
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9 Examples 



The use of the unfolding procedure described above is now illustrated with two examples. 

The first one is rather academic, and we have included it only because the same 
example was used in and [Q. The response function A{y,y^^^^) is given by 

A{y,y'n = [1 - 0.5(1 - i/*™^2)]{4exp[-50(i/ - y'^^' + O.Ody'^'''' ')']}. (58) 

Then the probability response matrix was built according to eq. (^ for 40 equidistant 
bins in the interval (0,2) for both y^'^'^^ and y. The matrix is presented in Fig. la. The 
true continuous distribution is taken to be 

4 4 2 

' 4 + (yt'^'i^ - 0.4)2 0.04 + (yt'^'i^ - 0.8)2 0.04 + (i/t^'^*' - 1.5)2 ' ^ ' 

After the convolution (^), the distribution B{ii) was discretized according to (H). Simulat- 
ing a counting experiment, a random normally distributed error was then added to each 
entry, assuming the overall initial statistics of 5000 events. The resulting "measured" 
distribution h is plotted in Fig. lb by a dotted line. The distortions caused by the mea- 
surement process can be seen by comparing the latter to the true distribution (pOf), shown 
by the solid curve in Fig. lb. 

The unfolding algorithm described above was then applied to the distribution h. Fig. Ic 
shows the plot of the rescaled and rotated r.h.s. vector d. The solid line corresponds to 
the actual measured histogram, and the horizontal dashed line shows the one standard 
deviation statistical error in dj, which is equal to 1 for each i. One can see that after 

1 = 10 the components di are clearly non-significant. The flatness of this distribution for 

2 > 10 and its apparent compatibility with the expected gaussian distribution with zero 
mean and unit variance, is in fact a test of the gaussian random number generator used 
to generate the errors in the measured histogram h. If we limit ourselves to less than 10 
equations, we lose some significant information. Namely, the choice k = 1 leaves effectively 
only one equation, and the obtained "unfolded" distribution x^^^ will be nothing else but 
a constant. On the contrary, by taking more than 10 equations one includes rapidly 
oscillating components with non-significant (and large) coefficients determined by the 
ratio di/ Si. In this particular example taking = 40 would result in a distribution x 
wildly oscillating with the amplitude of about 5000. 

The shape of the distribution d suggests that the effective rank k should be put equal 
to 10. The dashed line on Fig. Ic shows the regularized distribution d*^"^) calculated using 
(^), with r = s\q. It is interesting to compare this distribution with the similar one 
calculated for the exact true distribution (^) by the same procedure of rescaling and 
rotation, but without adding the random error (the dotted histogram in Fig. Ic). One 
can see that the regularized distribution is quite close to the true exact one. 

The obtained distribution c?^^^ is then used to calculate the unfolded histogram x^^-*, 
plotted in Fig. lb (data points). It should be compared to the true distribution (|5^) , shown 
by the smooth solid curve. Note that the error bars in x^^-* account only for the diagonal 
elements of the covariance matrix X, and thus underestimate the actual errors. The 
correlations between adjacent bins quite significant, so one should use the exact 

inverse of the covariance matrix for any kind of calculation involving the unfolded 
vector, and the regularized covariance matrix X'-^-' for the further error propagation. 
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Figure 1: a). The probability matrix A corresponding to the response function (0). b). 
The true distribution (^) (sohd curve) compared to the measured histogram b and the 
unfolded distribution x^'^^ for r = sIq. c). The absolute values of di (solid line) compared 
to the regularized r.h.s. (dashed line) and the one unaffected by the statistical fluctuations 
(dotted line). The horizontal line shows statistical errors in di, while the arrow indicates 
the boundary between the significant and non-significant equations, d). The deviation 
of the unfolded distribution from the true exact one (see text for details). 
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Fig. Id presents the difference between the exact distribution and the unfolded one, 
together with the bands showing one and two standard deviation statistical fluctuations in 
the true exact distribution. In this scale, one can still see some oscillations of the unfolded 
solution, but they are well balanced, distributed almost uniformly, and are confined inside 
the two standard deviation band of the true solution, thus indicating that the genuine 
error is about twice as large as the true statistical one would be, if the measurements 
were exact. The average over the 40 bins is equal to 0.9, meaning that the unfolded 
distribution is quite satisfactory. 

In the second example we unfold a simulated spectrum of the invariant mass of two 
pions, corresponding to the p(770)-meson mass region. An artificial two-dimensional 
histogram reflecting a possible detector behavior was generated as the detector response 
matrix A, and is shown in Fig. 2a. This time it is a number-of-event matrix resulting 
from some Monte Carlo simulation, as opposed to the probability matrix obtained by 
the integration of some analytical response function used in the previous example. The 
matrix is far from being diagonal, and the initial simulated distribution x™', shown by a 
dotted line in Fig. 2b, is hardly a constant. 

The "measured" distribution b was obtained in a way similar to the first example: a 
distribution a;**^^* was generated (solid line in Fig. 2b), which has a behavior distinctly 
different from a;™\ The measurement process was then simulated by the matrix multipli- 



cation: 



J:A^^-^ = b^, (60) 

and finally a random gaussian error was added to each entry bi, simulating statistical 
fluctuations. 

Rescaling and rotation results in a distribution di plotted in Fig. 2c. One sees that the 
effective rank of the system k is close to 9, so the parameter r should be set to the square 
of the 9th singular value of the matrix AC~^. The components di with z > 9 are clearly 
compatible with zero and have variances close to 1, thus confirming that the errors in the 
measured data are estimated correctly. 

As in the first example, the choice k = 1 would leave us effectively with only one 
equation, and the obtained "unfolded" distribution x*^^-* will be nothing else but the initial 
Monte Carlo distribution x™', shown by the dotted line in Fig. 2b. As for the solution of 
the non-regularized system with r = 0, it would include all non-significant components 
and would oscillate rapidly within the range ±(2^3)- 10^. This solution depends on the 
machine accuracy and obviously does not make any sense. 

The regularized distribution d^[^ is shown by a dashed line in Fig. 2c. It is to be 
compared with the exact distribution d*^*^* corresponding to the vector (|60D after the same 
procedure of rescaling and rotation, but without the random error added. One can see 
that the regularized vector is much closer to the true exact one for large z > 9. 

The resulting unfolded histogram x^p is shown by the data points in Fig. 2b. The 
difference of the unfolded and the exact test distributions is presented in Fig. 2d, together 
with one and two standard deviation bands describing the statistical errors in the test 
vector. Here, too, the error bars show just the diagonal elements of the error matrix, 
which in fact contains quite strong bin-to-bin correlations. The agreement is very good 
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Figure 2: a). The simulated number-of-events response matrix A. b). The true test 
distribution x^^^ (sohd hne) compared to the unfolded one (data points). The dashed 
histogram corresponds to the initial distribution according to which the response ma- 
trix was generated, c) . The absolute values of di (solid line) compared to the regularized 
r.h.s. (dashed line) and the one unaffected by the statistical fluctuations (dotted line). 
The horizontal line shows statistical errors in di, while the arrow indicates the boundary 
between the significant and non-significant equations, d) . The deviation of the unfolded 
distribution from the true exact one (see text for details). 
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indeed, especially if one considers the four orders of magnitude variation range of the test 
distribution. 

10 Conclusion 

The data unfolding method developed in this paper can be used in a wide range of 
applications, but is especially well-suited for high energy physics, where the response 
matrix is usually estimated by a Monte Carlo simulation of the measurement process, 
using some physically motivated initial distribution of the quantity under consideration. 

The extensive use of a very versatile and flexible tool — the Singular Value Decompo- 
sition of a matrix — allowed us to derive a concise loop-free algorithm for data unfolding. 
Our choice of the regularization term results in smooth unfolded distributions which have 
the smallest possible curvature among the solutions satisfying the initial linear system 
in the least squares sense. This is achieved by suppressing the coefficients of high-order 
rapidly oscillating components of the solution. The suppression factor depends on both 
the statistical significance of the equation and the magnitude of the corresponding singu- 
lar value. The regularized solution contains as much statistically significant information 
from the measured data as possible, simultaneously suppressing spurious, wildly oscillat- 
ing components. This suppression is a natural result of the solution process, when the 
initial linear system is expanded to incorporate the minimum curvature condition. 

The method used to solve the regularized system is extremely simple and reliable. An 
easy and straightforward way to determine the optimal value of the regularization param- 
eter is suggested, which allows at the same time to test whether the quoted measurement 
errors are adequate. Note that the presented solution method is quite fiexible and can be 
used with other choices of the regularization term as well. 

Obviously, as the number of statistically independent data points is usually smaller 
(and sometimes much smaller) than the number of bins in the unfolded histogram, the 
latter will probably have significant bin-to-bin correlations. In our approach, full propa- 
gation of errors from the measured distribution to the unfolded one is implemented, and 
both the covariance matrix of the unfolded solution and its inverse are easily calculated. 
This allows one to perform further error propagation and parameter fitting without any 
problem, so, contrary to the viewpoint expressed in f^, we do not think that one should 
use fewer bins and custom bin boundaries for the unfolded histogram, in order to make 
the covariance matrix diagonal. 

Obviously, curvature minimization introduces some systematic bias into the unfolded 
distribution, so the method will lead to acceptable results only if the true solution is indeed 
smooth, if the probability response matrix is used. However, when one uses the number- 
of-events response matrix, the condition of minimum curvature means that the deviation 
of the expected distribution from the initial Monte Carlo one should be smooth enough. 
This clearly allows one to use our procedure in cases when the measured distribution has 
some structure and/or a wide variation range, provided the initial Monte-Carlo has a 
similar behavior. If this is the case, then even for the small effective rank of the system, 
when the unfolded distribution happens to be quite close to the initial Monte Carlo, the 
former (in conjunction with the error matrix and its inverse) is still expected to give a 
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usable solution of the problem. 

Though we have tried to present comprehensive explanations of each step of the 
method, one should be able to use the algorithm formulated in Sect. ^ without hav- 
ing to understand in full all the details of the underlying mathematics. Note however, 
that the mathematics involved in this paper is still much simpler than that required by 
other regularization methods (e.g. [|T], |]) because the most difficult tasks are successfully 
dealt with by the SVD procedure. 

In short, the method presented in this paper allows one to unfold data obtained by 
any measurement process, if the response matrix of the detector is known. The use of 
the number-of-events matrix (and not the probability matrix) allows one to minimize 
additional uncertainties due to statistical fluctuations in the matrix itself, when the latter 
is obtained using the Monte Carlo simulation process. The concise linear algorithm results 
in a straightforward and simple implementation with full error propagation, including the 
complete covariance matrix and its inverse. The method is suitable for use in a wide range 
of problems, and can be generalized to incorporate multi-dimensional distributions. 
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